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The Bethe-Salpeter equation (BSE) is currently the state of the art in the description of neutral electron exci¬ 
tations in both solids and large finite systems. It is capable of accurately treating charge-transfer excitations that 
present difficulties for simpler approaches. We present a local basis set formulation of the BSE for molecules 
where the optical spectrum is computed with the iterative Haydock recursion scheme, leading to a low computa¬ 
tional complexity and memory footprint. Using a variant of the algorithm we can go beyond the Tamm-Dancoff 
approximation (TDA). We rederive the recursion relations for general matrix elements of a resolvent, show how 
they translate into continued fractions, and study the convergence of the method with the number of recursion 
coefficients and the role of different terminators. Due to the locality of the basis functions the computational 
cost of each iteration scales asymptotically as 0(N^] with the number of atoms, while the number of iterations 
is typically much lower than the size of the underlying electron-hole basis. In practice we see that, even for sys¬ 
tems with thousands of orbitals, the runtime will be dominated by the O(N^) operation of applying the Coulomb 
kernel in the atomic orbital representation 


I. INTRODUCTION 

Ab initio simulation of optical spectra is an essential tool 
in the study of excited state electronic properties of solids, 
molecules and nanostructures. For finite systems time- 
dependent density functional theory (TDDFT^based on local 
or semi local functionals is widely used. However, TDDFT 
fails in certain cases, notably for charge transfer excitation^ 
which are essential in, e.g., photovoltaic applications. An al¬ 
ternative to TDDFT is Hedin’s GW approximatiorP followed 
by the solution of the Bethe-Salpeter equation (BSEj^. Based 
on many-body perturbation theoty®^, the GVT/BSE method 
is a more systematic approach than TDDET, and it has been 
shown to give a qualitatively correct description of excitonic 
effects in solidJ®! and charge transfer excitation^^. 

The Bethe-Salpeter equation is a Dyson-like equation for 
the two-particle Green’s function, or equivalently for the four- 
point polarizabilitjGs! Within the field of electronic struc¬ 
ture theory, developments of th e BSE can be traced back 
to the beginning of sixtieJ^EIE^, with the first ab initio im¬ 
plementations appearing a couple of decades lateJi^EIl The 
GW/BSE method h as be en implemented using plane waves 
and real space grids j'° l *^l t^, linear combination of atomic or¬ 
bitals (LCAOj^^® and within the ELAPW frameworliP^. In 
practice, the standard way of solving the BSE is by convert¬ 
ing it to an effective eigenvalue problem in a particle-hole ba¬ 
sis. Since the size of the particle-hole basis scales quadrati- 
cally with the number of atoms N, a straightforward diagonal- 
ization of the BSE Hamiltonian will scale like O(N^). This 
very steep scaling makes it difficult to treat large scale sys¬ 
tems like nanostructures and realistic models of organic pho¬ 
tovoltaic devices. Eor such systems an improved scaling with 
the number of atoms would be highly beneficial. 

Avoiding an explicit diagonalization of the BSE Hamilto¬ 
nian can be done by using an iterative method to obtain a few 
low-lying transitions (e.g. the Davidsson methocP^^, or to 


directly aim for the spectrum, which can be done frequen cy by 
frequency using for example the GMRES methotP^l ^^ l ^ ^ lpr for 
the full spectrum with the Haydock recursion schemd^®!!!!! 
Another option is to go over to the time doma in and solve the 
equations of motion by time propagatiorP^Hl. These methods 
only require matrix-vector products to be performed, and as¬ 
suming that the number of iterations, or time steps, is much 
smaller than the size of the particle-hole basis, the asymptotic 
scaling will be However, setting up the BSE Hamilto¬ 

nian explicitly will still have the cost of 0(N^), and to avoid 
this, the matrix-vector products need to be performed on the 
fly, without explicitly constructing the matrix. 

Benedict and Shirley made use of the Haydock recursion 
method to compute optical spectra in the Tamm-Dancoff ap¬ 
proximation (TDA) without actually computing the whole 
BSE Hamiltoniarl^. This was achieved by using, in addition 
to the particle-hole basis, a real space grid product basis \x,y}, 
in which the screened direct Coulomb interaction is diagonal 
(the exchange term is sparse in this representation). The scal¬ 
ing of the algorithm was reported to be 0{N*) with the number 
of atoms, however, a more careful analysis shows that it can be 
made to scale like O(N^) by a proper ordering of the loop^. 

This favorable scaling is heavily based on the use of a real- 
space representation for the particle-hole states. Similar gains 
can be obtained with the use of LCAO basis sets, where the 
same asymptotic scaling can be obtained by making use of 
the sparsity in both direct and exchange Coulomb interaction 
terms. It should be mentioned that by using additional as¬ 
sumptions of locality, which implies screening away Coulomb 
matrix elements between basis functions that are spatially far 
from each other, one could even achieve linear scalin^^, how¬ 
ever, the BSE has so far not been treated with these meth¬ 
ods. Another linear scaling approach to many-body theory 
methods has recently been published by Baer and coworkers 
that make use of stochastic wave functions together with time 
propagatioiP3®. 
















2 


In the present publication, we will not venture into the 
realm of linear scaling but rather make use of the more stan¬ 
dard iterative methods that, together with locality, lead to cu¬ 
bic scaling with the number of atoms. We present an iterative 
algorithm to obtain the BSE spectrum for molecules, making 
use of localized basis sets both for orbitals and products of or¬ 
bitals. To go beyond the TDA a pseudo-Hermitian version the 
Hay dock recursion schem^^is used. We derive the recursion 
relations for general matrix elements of a resolvent and show 
how they translate into continued fractions. Our method has 
been interfaced to the SESTA codd® which is widely used for 
ground state density functional theory calculations (as an al¬ 
ternative, we can do all-electron calculation using numerical 
orbitals in an in-house implementation). Eor the case of the 
benzene molecule, as a prototypical example, we present a 
detailed study of the convergence properties of the iterative 
method, both within the Tamm-Dancoff approximation and 
for the full BSE. In particular, we study the effect of differ¬ 
ent termination schemes. Eurthermore, for the sake of clar¬ 
ity, we provide a detailed account of the BSE method itself 
using our notation. Our algorithm scales asymptotically like 
0{N^) with the number of atoms and uses 0{N^) memory. We 
present proof of principle calculations of our implementation, 
where the runtime is seen to be dominated by the 0{N^) scal¬ 
ing operations for systems up to several thousand orbitals, and 
discuss some of the bottlenecks and possible improvements of 
the scheme. 


II. THEORY 


A. Quasiparticles with the GW approximation 


Before the BSE can be set up and solved, the quasiparticle 
energies must be obtained from a preceding GW calculatiorP. 
Since the detail s of o ur GW implementation have been pub¬ 
lished elsewherd^SEI vve will here only give a brief summary 
of the method. The poles of the one-particle Green’s func¬ 
tion G for an A-electron system occur at the ground and ex¬ 
cited states of the corresponding N+l and N-1 systems, that 
is at the electron addition and removal energies. Hedin’s GW 
approximation connects the (irreducible) polarizability P, the 
non-interacting and interacting Green’s functions (G° and G), 
the screened interaction W, and the self energy E in a set of 
closed equations 


P{r. 




G^(r,r',a> - aj')G^(r',r,a)')dto', (1) 

W(r,r',a>) = v(r,r')+ 

J" v(r,r2)P(r2,r3,w)W(r3,r',w)c/V2t/V3, (2) 


X(r,r',a ))-— ( G°(r,r',CLi')W(r,r',a) - ai')daj', (3) 
Ztt J 

G(r,r',oj) ^G^(r,r',aj)+ 


f 


G“(r, r 2 , cu)S(r 2 , r^, co)G(r 3 ,r', u)d^r 2 d^r 3 . 


( 4 ) 


In our implementation of the GW method the Green’s func¬ 
tion is expanded in a basis of numerical atomic orbitals (AO) 
of finite support [fair)] 

G(r,r',u.)= ^ /,(r)5„j,G„-i,(cu)5,,U*(/). (5) 

aa'hb' 

Here and in the following we explicitly write out the overlaps 
Sab - f fa(f)fb(f)d^r when they appear, the matrix quanti¬ 
ties Gah{(^) are always contravariant and the placement of the 
indices as subscripts or superscript is arbitrary. With this rep¬ 
resentation of the Green’s function G, we see that the polariz¬ 
ability (0 involves products of AOs fa{r)f^{r). These prod¬ 
ucts are expanded in an (au xiliar y) product basis {E^(r)) of 
localized numerical functionJ^^^ 

where the expansion coefficients and the product basis 
functions {F^{r)] are determined by numerically expanding 
the products around a common center and removing redun¬ 
dant functions by a diagonalization based procedure^^l Only 
overlapping pairs of orbitals are considered, making the ma¬ 
trix of expansion coefficients sparse when using AOs of lo¬ 
cal support. The indices {aa'bb'} will be reserved for atomic 
orbitals and {ju, v) for product functions of atomic orbitals 
in the following. Using the product basis, the polarizability 
P(r, r', a>) is represented similarly to the Green’s functions Q 

P(r,/,cj)^ Y^F,{r)S-l,P,.Aco)S-,^Fl{r'), (7) 

where the overlap of the product functions = 

f F*(r)Fy(r)d^r appears. Similarly, it can be seen from equa¬ 
tion (|^ that the matrix elements of the bare v and screened W 
Coulomb interaction must be expanded in the product basis, 
while the self-energy S is expanded in the AO basis. Eor finite 
systems both the {fair)} and the product basis {/^((r)) can be 
chosen as real. 

The frequency-dependent quantities like Gabi^d) and P/jviio) 
are represented on an even-spaced, real-axis, frequency grid 
via their corresponding spectral functions. An imaginary part 
of the energy is added in the Green’s function G°(w) and po¬ 
larizability P(a)), that is sufficient to ensure their smoothness 
on the chosen frequency grid. The convolutions of spectral 
functions implied by equations Q and ([^ are computed via 
fast Eourier transforms. Due to the the fast convolutions and 
the locality of the product basis set, the asymptotic scaling of 
the algorithm is O(N^) with the number of atoms aH 3. Ei- 
nally the Dyson equation Q is directly solved for each fre¬ 
quency to obtain Gabi(d)- The quasiparticle energies are poles 
in Gab(co) and can in certain cases be determined from inspec¬ 
tion of the density of states. This does not give the quasi- 
particle wave function, however. In this paper we adopt the 
standard way of proceeding and assume that the Kohn-SharrP^ 
(KS) or Hartree-Eock (HE) eigenfunctions that are used to 
construct the zeroth order Green’s function G°(w) are good 
approximations to the quasiparticle states, so that they can be 
kept fixed and only the quasiparticle energy corrected. We will 
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here only consider the so-called GqWq approximation where 
a single iteration of the GW equations is performed without 
self-consistency. We focus on the KS “starting point” in this 
subsection. The KS Hamiltonian is 


+ y“'(r) -H y"(r) H- y’‘"(r) 


( 8 ) 


with T the kinetic energy, y®’“(r) the external potential, y’^(r) 
the Hartree potential and y“(r) the exchange-correlation po¬ 
tential. The KS eigenfunctions are expanded in the AO basis 


</'/('•) = ^ Xiafair) , 


(9) 


where Xia = 2^' S are the eigenvectors of the general¬ 
ized eigenvalue problem 




( 10 ) 

b b 

If we additionally assume that interacting Green’s function G 
is diagonal in the KS eigenstates tf/iir), the Dyson equation Q 
reduces to a set of scalar equations 


Gn(a>) - 


1 


^KS 


(Xuico) - vfn 


( 11 ) 


where we have subtracted the exchange-correlation potential 
y“ in order to be able to work with the KS eigenvalues. The 
(assumed real) poles are then found by identifying the zeros 
of the denominator, either by a graphical solution if the full 
frequency-dependent quantities are available, or more com¬ 
monly, by an expansion of E„(aj) around which leads to 


= Si 


+ Z,(ReS,,(ef®)-V“) 


Z, = 1 - 


d Re Z,',(w) I 

do) I 


( 12 ) 


Since we have access to the full frequency dependence of 
the self energy we can use the graphical method, which in 
principle is more accurate and also has the advantage that 
problems with satellite peaks can be avoided^. For compar¬ 
ison purposes we will also make use of the simpler equation 

0. 


B. Optical spectra with the Bethe-Salpeter equation 

The directionally averaged absorption cross section of a 
molecule is given by 


o-(co) = 


(13) 


where a„„'(aj) is the dynamical dipole polarizability tensor 
given by 


The interacting density response function, or reducible polar¬ 
izability, ;^'(r, r',aj) is defined in the time domain as a func¬ 
tional derivative of the density with respect to the change of 
the external potential: ^(1,2) s Numbered bold in¬ 

dices, i - {r,, cTj, tj}, refer to space, spin, and time coordi¬ 
nates, whereas plain numbered indices contain space and spin, 
i = {r,, cr,). ;^f(l, 2) is a two-point quantity and it is directly 
connected to the non-interacting density response ;^f‘’(l, 2) in 
RPA or in TDDFT with semi-local functionalJ^. However, 
when the Hamiltonian becomes non-local in space (as in the 
case of TDHF, TDDFT with hybrid functionals or Hedin’s 
GW approximation) one must first find the retarded four-point 
polarizability L(l,2,3,4), and then obtain the two-point one 
using the relation;^:'(1,2) = L(l, 1^, 2,2) (see appendix [A|. 

The four-point polarizability L(l,2,3,4) satisfies the 
Bethe-Salpeter equation as derived in appendix]^ In the fre¬ 
quency domain the BSE can be written 

L(l,2,3,4|w) = L°(l,2,3,4|w) 

-H r^f(5678)L'’(l,2,5,6|w)K(5,6,7,8)L(7,8,3,4|w) 

(15) 

with L**) 1,2,3,41 w) the non-interacting four-point polariz¬ 
ability and 

K( 1,2,3,4) = v( 1,3)d( 1,2)5(3,4) - VT( 1,2)d( 1,3)d(2,4), 

(16) 

the BSE kernel. Already here the approximation has been 
made that the screened interaction VT(1,2) is independent of 
the frequency. Introducing an orthonormal two-particle basis 
\ij} that has the representation (1,2|!7) = i/f/(l)i/f*(2) in terms 
of the quasiparticle spin orbitals, we can expand L as 

L(l,2,3,4| w) = 2<l,2|0')L,-,-,,Kw)<k/|3,4) 


ij,kl 


ij.ki 


(17) 


with the matrix elements given by 

Lij,ki(oj)^ f 5(1234)iA*(l)tAj(2)L(l,2,3,4|w)iA^(3)iA;(4). 

(18) 

is expanded similarly. This leads to the matrix equation 




i'j',k'i' 


Equation has to be inverted for each frequency which is 
computationally cumbersome. Eortunately, with certain ap¬ 
proximations, it can be reformulated as an effective eigenvalue 
problem that only has to be solved once. To proceed with this 
we choose as our one-particle states the quasiparticle states 
in which the interacting Green’s function G is assumed to be 
diagonal. This leads to L? being diagonal in the two-particle 
basis 




SikSjiif-fj) 

oj - (e/ - €/) + iy' 


amm'(w) = - I d'^rd^r'r„,x(r,r',to)r'^, . 


(14) 


( 20 ) 
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where fi denotes the occupation number of spin orbital i/r, . We 
put the expression ( |20) i in equation ([T^, rearrange terms and 
get after some algebra 

Uj^kiioj) = [(cu + iy)6n'6ri' - (fk " //) > (21) 

where we introduced the frequency-independent BSE Hamil¬ 
tonian 

= y , 

ij,kl (22) 

= i^J - ^d^ikSji + {fi - fj)Ku.k, ■ 

The matrix i/®®® is non-Hermitian. If we solve for its right 
eigenvectors and eigenvalues 

= eAA), (23) 


and define expansion coefficients of the eigenvectors in terms 
of the the two-particle basis Afj = {ij\A}, we can obtain a spec¬ 
tral representation of the interacting polarizability as 




A^jS-,]X:ifk-f) 

a)-eA+iy 


(24) 


Here the overlap of the right eigenvectors S a a' = 
appears because the eigenvectors of a non-Hermitian eigen¬ 
value problem are generally not orthogonal. Using equations 
( [T4 | i, ( [T7] i and a resolution of the identity in the quasiparticle 
product states, we can rewrite amm'ioj) in terms of L as 

(CO) = y D'-;Lij^a(cS)D% , (25) 

ijki 


with the transition dipoles 

D"} = {ij\D,„} = J d(l)i/o’;(l)r,„,Pj(l) 




(26) 


d^rtf/*(r)r^i)/j(r). 


Here if/i(r) is the spatial part of (/',(1), and jic(cr) is the corre¬ 
sponding spin function. Here we denote the dipole operator 
as a ket, since in general a normal two-point operator A can 
be expanded as A = Yiij Ai-ii\i){i\ = Yiij In the preced¬ 

ing analysis spin is explicit in the orbitals. However, 
is not diagonal in a spin orbital basis. If it is diagonalized in 
the spin indices (see appendix]^, one singlet and three triplet 
product functions result, where the singlet one being the only 
one to have a non-vanishing transition dipole moment and so 
the one visible in the optical response. In the following we 
will suppress the spin indices and only work with the space 
quantities. Because of spin symmetry the coupling elements 
K are modified with the factor p!'- being 2 for the singlet and 
0 for the triplet 


Xki^- f d\d\'r.{r)Ur)W(r,r')ilfj{r')ry), 


and the transition dipoles for the singlet get an additional fac¬ 
tor of V2 (see appendix [ b|) 

jy^singie.^ V2 J d\r,(r)r^i)f j(r). (28) 

and the triplet transition dipole is zero. This means that the dy¬ 
namic dipole polarizability effectively gets an additional fac¬ 
tor of two for the singlet transition. Since we always consider 
the singlet for dipole transitions we can drop the ’’singlet” su¬ 
perscript and let D™ refer to equation 28 An important simpli¬ 
fication to the problem is that, due tome occupation factors, 
only particle-hole and hole-particle product states contribute 
to the polarizability (see appendix]^ and we can write the 
eigenfunctions of as 


IT) = y |vc)A^, + y |cv)A)\ 


(29) 


Here and in the following the indices {vv') will denote oc¬ 
cupied (valence), {cc') empty (conduction, unoccupied) and 
{ijkl] general molecular orbitals. Projecting the eigenvalue 
equation ( |2^ from the left with {vc| and {cv| we obtain a ma¬ 
trix equation with the following block structure 


Kyc+K^cyc 




-K, 


cv,v'c' 




VC,C'\’ 

-K, 


cv,c'v' 




(30) 


where = (ej - ei)6ik6ji. Using the symmetry proper¬ 
ties of the BSE kernel Kij^u - - K’^nj and of the non¬ 
interacting Hamiltonian = ~^'jiik’ write 


H' 


BSE 


-K* . 


d^vcy v' 

(HX+P.,yc'r 


(31) 


The second form ( (ST) is useful because it leads to computa¬ 
tional savings when explicitly setting up the matrix. In the 
Tamm-Dancoff approximation the off-diagonal blocks in the 
^BSE g j-jjg couplings between hole-particle and particle- 
hole states) are set to zero. This leads to two uncoupled Her- 
mitian eigenvalue equations for A^^ and A)!^. Due to the sym¬ 
metries displayed in equation ( [3T] i we see that the eigenvalues 
of the two blocks are related as e'f = the eigenvec¬ 

tors as A;!j, = Aye, where the superscript refers either to the 
(cv) or the {vc)-sector. Therefore, only one of the equations 
needs to be solved, for example the one for the the {vc)-sector; 
Zi-'c' ddPv'c'^i'c' ~ Using the fact that the eigenvectors 

are orthogonal for a Hermitian problem, the non-zero blocks 
of the the four-point polarizability are 


rXDA 

^veye' 




.1 


cj-eA 


A* 

+ 


iy ’ 


rTDA 

^cvp'v' 



A 


A* A A 
vc-^v'c' 


+ Q + iy 


(32) 


The TDA is a widely used approximation that, in addi¬ 
tion to the computational advantages, often provide good 
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agreement with experimental excitation energies for organic 
moleculeJ^DllIl At this point it is interesting to note the simi¬ 
larities of the BSE, TDDFT and time-dependent Hartree-Fock 
(TDHF). In TDDFT, although for semi-local functionals it is 
in principle sufficient to look at the response of the density, 
one can more generally look at the response of the density 
matrix as was done by Casid^. The resulting equations are 
very similar to the BSE, with the only difference that the GW 
eigenvalues are replaced by KS eigenvalues, and that the di¬ 
rect term is replaced by a TDDFT exchange-correlation ker¬ 
nel. For semi-local exchange-correlation functionals, and real 
orbitals, the resulting eigenvalue problem can be reduced to 
a Hermitian problem of half the size — the preferred for¬ 
mulation of TDDFT in quantum chemistry. However, when 
Hartree-Fock exchange is included (in hybrid functionals for 
example) the reduction to the Hermitian form does not sim¬ 
plify things quite as much, since one needs to take the square 
root of a full matrix which requires an additional diagonaliza- 
tion. The TDHF response equations have the same structure 
as the BSE ones with Hartree-Fock eigenvalues and an un¬ 
screened direct term. The Tamm-Dancoff approximation is 
also useful in TDDFT and TDHF. For TDHF with TDA one 
recovers the configuration interaction singles (CIS) equation. 

To set up and diagonalize the BSE Hamiltonian ( [3T] i is fea¬ 
sible only for systems with a few thousand of particle-hole 
pairs. For larger matrices an iterative procedure is essential 
both for memory and runtime requirements. In the follow¬ 
ing we describe how the the dynamical dipole polarizability 
tensor ( p5| ) can be computed with a Lanczos-type iterative 
method. 


1. Continued fraction expression for the BSE polarizability 


Using equations pT| ) and ( p5] ) we can rewrite a matrix el¬ 
ement of the dynamical dipole polarizability tensor (141 in a 
form involving the resolvent of the BSE Hamiltonian 


ijkl 

BSE 


(33) 




where 

\Dm) = ^ \ij){ij\Drf) , 

u 

u u 


(34) 


(the TDA-superscript since it will be clear from the context if 
the TDA is used or not). 

An attractive method of dealing with resolvents is the Hay- 
dock recursion schem^Sl where a diagonal matrix element 
of a resolvent is efficiently computed from Lanczos recur¬ 
sion coefficients by means of continued fractions. Recently 
it has been shown that also non-diagonal matrix elements 
of the res olven t can be computed from the same Lanczos 
coefficient^^nill! Usually the continued fraction representa¬ 
tion of the resolvent is derived by determinant relations. Here 
we present an alternative derivation that only uses the power 
series expansion of the resolvent and the orthogonality among 
the Lanczos vectors. The off-diagonal matrix elements come 
out naturally in this formulation, and it is straight-forwardly 
extendible to block Lanczos, two-sided Lanczos and pseudo- 
Hermitian Lanczos schemes. Our derivation also connects to 
the theory of relaxation functions, also known as the Mori 
projection technique, first introduced to describe the Laplace 
transformed correlation function of dynamical systems^ and 
later reformulated by Led^ in a form more closely related to 
the one we use here. 

We want to compute (!|(w - — a general matrix 

element of the resolvent of the Hermitian operator H, with the 
frequency a> in general a complex number. Let us define a 
frequency-dependent solution vector 

(37) 

where \j) - |y)/||yl| is the normalized \j}. The matrix element 
of the resolvent in terms of the solution vector ( |J7] ) reads 

{i\(co-Hr^\j)^{i\](co))-\\j\\. (38) 

Now we generate a set of orthonormal Lanczos vectors {|/„)) 
with the starting state |/o) = Ij), using the standard recursion 
relationJ^ 


^n+ll/«+l> = H\f„} - \f„)an - \fn-\)bn , (39) 

with the real coefficients = {fn\H\fn) and bn - (/n-i|H|/„). 
Next we expand the solution vector |y(w)) in the Lanczos basis 

|J(w)) = ^|/„)c„(cu), (40) 

n 

where the frequency dependent expansion coefficients c„(a;) 
are given by projection onto the basis 

Cn(w) = </„|;(w)) ■ (41) 


In the last equation {ij\Dm) refers to the singlet transition 
dipole in equation ( |28| l, and we denote the occupation differ¬ 
ence matrix by 

Fij,M = Hi - fj)6ik6ji (35) 

In the Tamm-Dancoff approximation we only consider (vc) 
states which means that the transition dipoles become 

\Djn^) = = X ’ (36) 


The expansion coefficients c„(w) contain the information nec¬ 
essary to compute the sought matrix elements of the resolvent. 
The diagonal matrix element is especially simple (remember¬ 
ing that |/o> = Ij)) 

{j\{u - HH\j) - <JlJ(w)> ■ \\jf = co(tu) ■ \\jf , ( 42 ) 

that is, only the zero-th coefficient co(co) is needed. 

In the original Haydock recursion scheme only diagonal 
matrix element were computed. For our purposes we also 
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need the off-diagonal elements, which can be computed us¬ 
ing the higher expansion coefficients 

{i\(aj - Hr^j) = {i\](co)) - J]<f|/„)c„(w). (43) 

n 

The projections (/!/„) of the vectors (f| with the Lanczos 
basis can be computed and saved when the Lanczos vectors 
are available, thus avoiding the storage of more than the last 
two vectors. As we shall see, the coefficients c„{a>) can be 
computed from continued fractions. An advantage of using 
continued fractions is that one can terminate them in a phys¬ 
ically sensible way which can reduce the number of Lanczos 
vectors one has to explicitly compute. Projecting the Hermi- 
tian transpose of equation (39 1 onto the solution vector Ijiai)) 
gives 

bn+iifn+ilKt^)) = {fn\H\j(u)) - a„{f„\j(aj)) 

(44) 

-f7„</„-il;(w)>. 

Applying the operator H onto the solution vector gives 

H\](aj)} = aj\](aj)) - \]). (45) 


2. Iterative non-TDA BSE 

The full BSE Hamiltonian is non-Hermitian which means 
that the Lanczos procedure outlined above must be modi¬ 
fied. A two-sided Lanczos procedure where both left and right 
eigenvectors are generated in the recursive procedure can be 
used, although it suffers from instability issues due to the loss 
of orthogonality between the Lancz os vectors, often requir¬ 
ing explicit reorthogonalizatiorPSSI p also involves twice 
the number of applications of the Hamiltonian. Recently, a 
pseudo Hermitian algorithm was published that exploits the 
structure of the BSE eigenproblem to convert it into a Hermi¬ 
tian problem in a special scalar producP^. In this algorithm 
one avoids the extra multiplication of the Hamiltonian that is 
present in the two-sided scheme. Below we summarize the 
pseudo-Hermitian algorithm in our notation. 

An operator A is pseudo-HermitiarP^ with respect to the 
invertible Hermitian operator 77 , if 

A = (53) 


which follows directly from the definition of the inverse 
(w - H){aj - H)-^ = 1 


(46) 


together with the definition of the solution vector \j(to)) (37 1 . 
Inserting equation ( 45 1 into equation (44 1 we obtain a recur¬ 
sion relation for the expansion coefficients c„(w) 

bn+iCn+i(co) = coc„(a>) - (5„o - a«c„(w) - b„c„-i(to). (47) 

Eor « = 0 the relation can be rearranged to give 
co(w) -[lo-ao- f7iCi(w)C(,'(w)]^', 
while for n > 0 we obtain 

c„(w)c„)‘j(w)f7^' = [w - a„ - f7„+ic„+i(w)c;'(w)]. 

We now introduce the relaxation functions of order n 

(fioioj) = Co(w) , 

(p„{cj) = c„(w)c„li(w)f7;',n > 0. 


(48) 


(49) 


(50) 


After inserting the expansion coefficients (50i in equations 
( |48| l, ( |49] ) we obtain the continued fraction relations familiar 
from the Haydock recursion scheme 

(finioj) ^[co-a„- bl^yipn+\{tjJ')T^ ■ (51) 

After the relaxation functions have been computed for a cer¬ 
tain frequency, the expansion coefficients c„{to) can be recov¬ 
ered by inverting the relation ( |50| l 

c„(w) = ip„{LS)b„Cn-\{(A) 

= ip„{LS)b„ip„^l{(jj)bn-\ ■ ■ ■ ip\{oj)b\ipo{oj). 

In summary, first the coefficients a„ and bn, as well as the 
needed projections (/!/„) are obtained from equation (39 1 , then 
for each oj (adding a small positive imaginary part, as ap¬ 
propriate for the retarded response), the relaxation functions 
(fnicj) are computed from equations (51 1 using a properly cho¬ 
sen terminator. Then, the expansion coefficients c„(w) are ob¬ 
tained from equation ( [52] l. Einally, the matrix elements are 
computed from equations ([42| and (j4^. 


(52) 


This means that rjA is Hermitian, or equivalently that A is Her¬ 
mitian under the scalar product (-l-)^ = provided that 

the metric 77 is positive definite so that the scalar product is 
well-defined. Eurthermore, the eigenvalues of A are real if 
it is pseudo-Hermitian with respect to an operator that can be 
written like 77 = 00 ^ with O an invertible operatoil^, and such 
a factorization can always be found for a positive definite 77 . 
If A is a product of two Hermitian operators A = BC, then A 
is pseudo-Hermitian with B * and C, which can be checked 
using equation (53 1 . The BSE Hamiltonian given by 


equation ( 221 can be written in matrix form 


//BSE ^ ^0 ^ pf. ^ 


(54) 


with F given by equation (35 1 . Since = / we can write 


fjBSE ^ pfj ^ 


(55) 


where 


H = Ftf + K. (56) 

Since E//*’ is diagonal and real, and .j, it follows 

that H is Hermitian. Erom the preceding discussion it is clear 
that //BSE pseudo-Hermitian with respect to 77 = E * or 
T] - H. Since E is not positive definite it doesn’t serve as a 
metric for a scalar product. H however, should b e positiv e def¬ 
inite unless there exist singlet-triplet instabilitie s^ * | 52 | 62 | 
instabilities do occur for molecules, and especially for triplet 
excitations H can lose its positive definiteness. This will make 
the pseudo-Hermitian algorithm fail. However, since in this 
case also direct diagonalization gives unphysical results one 
should not view this failure as a drawback of the method. 

Within the pseudo-Hermitian Lanczos scheme the same 
steps are followed as in the Hermitian case. The only dif¬ 
ference is that the scalar product is changed form the ordinary 
(j-) to (ji?-), with the Lanczos vectors orthonormal in this 
product. This means that equation ([39]l stays the same, but 
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the Lanczos coefficients are modified to a„ = 

and bn - { which can be seen by multiplying 

equation (391 by and using the orthogonality of the Lanc¬ 


zos vectors in the ( I// ) scalar product. To make the starting 
vector normalized, it is chosen as |/o) = \ j) - 

Due to the metric introduced in our scalar product we effec¬ 
tively have right and left Lanczos vectors, related by \f^} - 
and |/*) = |/„), although only one set of vectors is 
necessary in the actual computation. The resolution of the 
identity in the Lanczos vectors is 

1 = Xi ^ Z = Z ^ 57 ) 


which means that the matrix element of the resolvent must be 
computed as 


<i|(cu-//)-'IJ) = 

n 


(58) 


Here c„ (w) = {fn\H\j(tj)} replaces equation (41 1 — the other 


equations that are needed can be derived as in the Hermitian 
case, only replacing the scalar product. Here, even if we only 
want a diagonal matrix element we have to sum over the pro¬ 
jections of all the Lanczos vectors, because the starting (right) 
vector is not orthogonal (in the ordinary scalar product) to the 
other Lanczos vectors. 


C. Implementation of the Bethe-Salpeter equation 

Having a general description of the BSE and of an iterative 
algorithm for solving it, we will describe our implementation 
using local basis functions. 


{Ljj(r)), the exchange and direct terms in equation (27 1 take 
the following form 


(59) 



Tjdir _ 

^ijM - 


where the bare and screened Coulomb matrix elements in the 
local product basis are 






cf’rd^r'Fl(r)v(r,r')Fy(r'), 


d\d^r'Fl(r)W(r,r',a>^ 0)Fy(r'). 


(60) 


The expansion coefficient V'J of a product of two quasiparticle 
states is given by 


( 61 ) 


II 




a 

b 


where the expansion coefficients are tfiose appearing in 
equation (j^. Unlike tfie local product coefficients Vjf, the 
eigenstate product coefficients V‘J are not sparse and tfie equa¬ 
tions (59 1 will scale like 0{N^) if tfie loops are ordered in the 
proper way as shown by the boxes (equation (61 1 costs OiN'^) 
operations). The singlet transition dipoles can also be calcu¬ 
lated from the product functions 




(62) 


where the dipole moments in the local product basis are D™ 
f d^rF*^(r)r„,. 


1. Non-iterative algorithm 


It is straightforward to compute the matrix in equation (31 1 
and diagonalize it to obtain the four-point polarizability from 
equations ( [24| ) and (25 1 . The matrix elements of the kernel 
K are computed using equation ( |27] i. The construction of 
the matrix requires O(N^) operations {N being the number of 
atoms) and O(N^) memory for storage. Solving the resulting 
eigenvalue problem using standard diagonalization techniques 
gives an even more prohibitive scaling of O(N^) with the num¬ 
ber of atoms. A way to avoid this excessive scaling is to limit 
the number of electron-hole pairs that are included in the cal¬ 
culation. However, the energy range covered by a constant 
number of pairs decreases with increasing system size, lead¬ 
ing to a deteriorated description of the spectrum. In practice, 
the limit where explicit diagonalization is feasible is reached 
for a few tens of atoms: for larger systems iterative schemes 
are more efficient. Nevertfieless, for small systems and for 
testing purposes straightforward diagonalization is a simple 
and useful alternative. Using our localized product basis set 


2. Iterative computation of the BSE 

Let us first look at tfie TDA whicfi is simpler tfian the 
full BSE. Because only the {vc)-sector needs to be solved, 
the eigenvalue problem is Hermitian. Moreover, because 


\D„,} - |D(„) in equation (361, we only need to calculate a 


diagonal matrix element of the resolvent to get the diagonal 
dynamical dipole polarizability 


amm(w) = -{DmlicO - ■ \\D,„f 


= -co(w) • \\D„ 


(63) 


Here \Dm) - \Dm}/\\Dm\\ in equation (34i is used as the 
starting vector in the Lanczos recursion. The dynamical 
dipole polarizability can directly be written as a continued 
fraction using equations ( [50| and (j5T|i: 


CO + ly - Go 




o) ly - ai - 


(64) 
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The Lanczos procedure for TDA is 


l/-i) = 0 , 
l/o) = \b,n), 

= //*"^|/„) - an\fn) - 

b„+l = 

l/«+l) = \fn+l)/bn+l, 


The operation is separated in three steps; first the coefficient 
vector is transformed from the eigenstate basis to the local 
basis 


fO 

Jn 


■ab 




then the kernel K is applied in the local basis 


(65) 


riah _ ra'b' 

J n ~ / j ^ab,a'b'Jn ■■ 


a'b' 


(73) 


(74) 


where first a non-normalized vector |/„+i) is computed and the 
coefficient is computed from its norm. The most time- 
consuming step in computing the Lanczos coefficients is the 
application of the Hamiltonian to a vector. Generally, we ex¬ 
press the Lanczos vector in the |vc), |cv) basis, similarly to the 
BSE eigenvectors in equation (|29]l 


and finally the coefficient vector is back transformed to the 
eigenstate basis 


{vc\K\fn) = 


Z -y* \ ^ y rfClb 

^va ^cbj n 


(75) 


\fn) + (66) 

VC 

with the expansion coefficients 

/r = <v'c|/„), /- = <cv|/„). (67) 


The transform and back-transform can be done in 0{N^) op¬ 
erations since they consist of matrix-matrix multiplications 
which are done sequentially, as shown by the boxes. The ap¬ 
plication of the kernel Kab,a'b' would generally take 0(Nb op¬ 
erations, but due to sparsity it takes actually 0{Nb operations, 
is expressed in the {T)j(r)) basis as 


In the TDA we only make use of the |vc) functions. We want to 
find the expansion coefficients of the vector resulting from the 
application of the Hamiltonian, that is {vc\H^^^\fn} The action 
of non-interacting part is evaluated in 0{N^) operations 

<vc|//Vn) = (fc - e.)/r. ( 68 ) 




ll,V 


(76) 


and the action on the coefficients becomes 

Z HTb.a'b’ fn”' - Z Z Z ci’’' . (77) 

a'b' fi3a,b v a',b'£v 


To exploit the sparsity in the kernels 77“ and we also 
make use of an atomic orbital product basis \ab} with real 
space representation {rr'\ab) - fa(i')f^(r'). Using the expan¬ 
sion of the quasiparticle states in AOs, equation (j^ we have 


For the direct term we similarly get 


rjdir _ 
^ab,a'b' 




y*aa yr yb. 

yyfivyy 


(78) 


\ij)^Yj\^b)Xi,X*,, (69) 

ab 

which allows us to rewrite the kernel as 

Kij,kl = Z K^jhKab,a'b'Xka'Xl,. (70) 

ab,a'b' 


with the matrix elements of the kernel expressed in the AO 
basis 

_i_ Ijdir 

^ah,a'b' ~ J ^ab,a'b' ^ab,a'b' ’ 

Kla'b'- ^d\d\ra{r)fb{r)v{ry)fAr')rb,{{r'), 

^ab.a'b'--f d\d\A:{r)fAr)W{r,r')fb{r')rb,{r'). 

The application of TT to a Lanczos vector becomes 


and the action of the coefficients are 




a'b' 


ab.a'b' 


r 

J n 


ZZ Z''; 

a' ,b' fi3a,a' v3b,b' 




bb' 


fa'k 


(79) 


The Coulomb matrix elements v^y and W^y are given by equa 


tion (601. Because by construction the matrix of product coef¬ 
ficients is sparse, a fixed number of atomic orbitals couple 
for each yu or v and the operations in equations ( |77] i and ( |79| ) 
scale asymptotically as G(A^). 

For the solution of the full BSE problem we use the pseudo- 
Hermitian Lanczos scheme with the scalar product {■\H-) as 
explained in the previous section. A matrix element of the dy¬ 
namical dipole polarizability computed with the iterative al¬ 
gorithm is given by 


Q'mm'(w) = -{D,„\{u) - + \y) 


(80) 


<Vc|7:|/„) = Z Z Z ^y'a'K'b'fn"' ■ (72) 

ab a'b' v'c' 


where the coefficients c^(ai) are computed from the continued 
fractions <p„{a>) as given by equations 0 and ([52|). Note that 
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since we are already computing ofF-diagonal matrix elements 
there is little extra cost to obtain the full dynamical dipole 
polarizability tensor, and not just the diagonal matrix elements 
as is usually done in the TDA case. The Lanczos procedure in 
the pseudo-Hermitian case is 


l/-i) = 0, 
i/o) = 

l/o) = Wo), 
bo = 

l/o) = l/o)/^o, 
l/o) = l^')/^ 0 , 

an = </„'l^’l/„'), 

l/i+l) = ■P’l/„') - a„\fn) - b„\fn-l), 

IfUi) = W«+i), 

bn^i = 

l/«+l) = \fn+l}/bn+l, 

l/«+l) = l/„'+l)/Vl- 


( 81 ) 


In this scheme the intermediate vector I/',,) is saved between 
iterations in order to minimize the number of applications of 
the Hamiltonian. To perform a Lanczos iteration we need to 
apply H given by equation (56 1 to some vector |/„), now con¬ 
taining both particle-hole and hole-particle amplitudes. This 
in done much in the same way as in the TDA case. The term 
Fib is diagonal in the eigenstate basis and becomes 


{vc\F}f\f„) = Y,(FH\cyc'fn‘^' = (ev - a)/^ , 

v'c' 

{cv\Ffb\f„} = = (e, - e,)/" . 


For the coupling matrix elements we transform to the atomic 
basis as in the TDA case, with the exception that we need to 
make use of both |vc) and |cv) vectors. The transformation is 
done for both sets of vectors as shown below 


rab _ 

Jn ~ 




c 

V 


(83) 


After the auxiliary vector in equation ( [83] l has been computed, 
the exchange and direct terms are applied in exactly the same 
way as in the TDA case, and after that the coefficients are 
back-transformed as 


a 

V ffab 
/ j ^cbj n 

b 

a 

h 


(84) 


Finally we need to apply F, which is easily done considering 
its definition ([3^ 


v'c' 

<cv|F|/„)=^F„,,-,,//W-/-. 


(85) 


To conclude, we have shown that the application of the Hamil¬ 
tonian onto a particle-hole state takes 0{N^) operations, both 
when using the Tamm-Dancoff approximation and solving the 
full BSE. If we use the continued fraction method with a given 
broadening we can assume that the number of Lanczos coeffi¬ 
cients will be independenP^of the number of atoms. This then 
leads to an overall 0(N^) complexity scaling of the algorithm. 


III. TEST CALCULATIONS 
A. Simple cases: Na2 and CH4 

As a first test of the implementation we look at two simple 
test systems for which we can make accurate comparisons to 
other codes. The sodium dimer is simple in that it has only 
one valence orbital (filled with two electrons) which makes 
the spectrum dominated by single transitions. We computed 
the Go Wo/BSE for this system starting from an all-electron 
HF ground state calculation performed with our code, using 
the cc-pVDZ Gaussian basis set treated as numerical atomic 
orbitals. The GoWo calculation was performed using fully 
frequency-dependent self energy in the range of the valence 
and semi-core states while the Is core orbitals were treated 
with HF exchange only. The quasiparticle energies were com¬ 
puted using the standard first order expansion of the Re Z„(a;) 
around the initial HF eigenvalue e, according to equation ( [T^ . 
This procedure is less accurate than solving for the quasipar¬ 
ticle energy graphically, but here we are more interested in a 
comparison rather than a fully converged result. The BSE was 
solved by direct diagonaliz ation. E or comparison we use the 
MOLGW code by BrunevaP^I^^Ill where we as far as possible 
use the same parameters as in our code. In table|I]we compare 
the first ionization potential (IP) and electron affinity (EA) as 
well as the position of the first BSE transition obtained with 
the two codes. The agreement is excellent. Furthermore, the 
computed cross sections for both TDA and full BSE match 
very well even for higher transitions — the obtained optical 
spectra lie on top of each other, as can be seen in the figure 
[2 In the calculation of absorption cross section a Lorentzian 
broadening of 0.2 eV was used. 

As a second example we chose to investigate the methane 
molecule, CH 4 . Similarly to the case of sodium dimer, we 
have chosen cc-pVDZ basis in both calculations, 0.2 eV 
Lorenzian broadening and also followed as much as possible 
the same procedure to extract GqWq eigenvalues. In table [n| 
we make the same comparison as in previous example, with 
the same excellent agreement for GW energies, first optical 
excitation in TDA and full BSE. The computed spectra are 
also in perfect agreement, as can be seen in figure 
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TABLE I. Comparison of calculated energies obtained with our code 
and MOLGW cod^^ for the sodium dimer. In both calculations the 
cc-pVDZ basis set is used. Energies are given in units of eV. 


This work MOLGW 


IP (HE) 

4.53 

4.54 

EA (HE) 

-0.14 

-0.13 

Gap (HE) 

4.68 

4.67 

IP (Go Wo) 

4.88 

4.88 

EA(GoWo) 

0.17 

0.18 

Gap (GoWo) 

4.71 

4.70 

BSE (TDA), singlet 

2.29 

2.29 

BSE (full), singlet 

2.04 

2.03 



0 2 4 6 8 10 

CO (eV) 


FIG. 1. Comparson of the absorption cross section of the sodium 
dimer between our method (red full line) and MOLGW (blue dotted 
line). A Lorenzian broadening with FWHM of 0.2 eV was used in 
both cases. 


B. Iterative method versus diagonalization 

Confident that our BSE matrix is set up correctly we now 
turn to the iterative method. As a more suitable test case we 
chose the benzene molecule that is small enough for direct 
diagonalization (with a moderately large basis set) while still 
having many transitions that contribute to the spectrum. The 
ground state calculation was done with the SIESTA codd^ 
using the PBE functional and a DZP basis set, using an en¬ 
ergy shift of 3 meV. Although this basis set is not fully con¬ 
verged for GW quasiparticle energies and optical properties it 
gives reasonable results for the IP (8.85 eV) and EA (-1.34 
eV) compared to earlier obtained resultd^, and to experimen¬ 
tal values^. The first visible optical transition in our calcu¬ 
lations occurs at 6.95 eV for the TDA and 6.18 eV for the 
full BSE, compared to the experimental value of 6.92 eV (ex¬ 
tracted from the experiment shown in ref. l 6 ^ . We note that 
the effect of introducing the TDA here is quite large. A de¬ 
tailed account of the convergence properties of quasiparticle 
energies and BSE spectra for this system, as well as for larger 


TABLE 11. Comparison of calculated energies obtained with our 
method and the MOLGW codd^H for methane. In both calculations 
the cc-pVDZ basis set is used. Energies are given in units of eV. 


This work MOLGW 


IP (HE) 

14.76 

14.77 

EA (HE) 

-5.26 

-5.25 

Gap (HE) 

20.02 

20.02 

IP (Go Wo) 

14.40 

14.41 

EA(GoWo) 

-4.81 

-4.81 

Gap (Go Wo) 

19.22 

19.22 

BSE (TDA), singlet 

12.58 

12.59 

BSE (full), singlet 

12.55 

12.55 



FIG. 2. Comparson of the absorption cross section of CH 4 between 
our method (red full line) and MOLGW (blue dotted line). A Loren¬ 
zian broadening with FWHM of 0.2 eV was used in both cases. 

organic molecules, will appear in a forthcoming publicatiorP^. 
Eor the evaluation of the iterative method, the parameters we 
choose here are fully sufficient. 

In figures and we show the comparison of the itera¬ 
tive method for TDA and non-TDA to direct diagonalization. 
A simple truncation of the continued fraction is used here. 
We see that the converged iterative spectrum is obtained with 
around 200 recursion coefficients for TDA and around 400 
for the non-TDA spectrum for this broadening. Note that the 
full particle-hole space has a dimension of 1400 for TDA and 
2800 for the full BSE. 

Next we look at different terminators of the continued frac¬ 
tion. The last relaxation function in equation (|5T| is assumed 
to satisfy 

ipn-\{(o) = [w - a„-i - blipT{a))T^ (86) 

where the terminator function. The simplest termina¬ 

tor is obtained by truncation, which means that the remaining 
coefficients that are not explicitly computed are set to zero. 
This gives (^^(w) = l/w, and corresponds to a representa¬ 
tion of the dynamical dipole polarizability as a sum of delta 
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0 5 10 15 20 25 30 


CO (eV) 

FIG. 3. The convergence of the trace of the TDA polarizability with 
the number of recursion vectors for benzene. The results obtained 
with 10, 25, 50, 100 and 200 iterations are compared to direct diago- 
nalization of the full BSE Hamiltonian (dashed lines). 



CO (eV) 

FIG. 4. The convergence of the trace of the full BSE polarizability 
with the number of recursion vectors for benzene. The results ob¬ 
tained with 10, 50, 100, 200, 300 and 400 iterations are compared to 
direct diagonalization of the BSE Hamiltonian. 


functions. However, often a more suitable terminator can be 
found by extrapolating the remaining coefficients according to 
some physical model suited to the system of study. In the hrst 
model we consider, the dynamical dipole polarizability is as¬ 
sumed to be a continuous distribution without gap, centered at 
a and with width 2Ew- In this case the a„ coefficien ts sho uld 
converge to a, and b„ should converge to b - At 

convergence, we get the the so called “self-consistent” termi¬ 
nator (SCjSl 

(frico) - [co - a - b^ipT(co)y^, (87) 

which has the solution 

^ ^ oj-a- J(a)- a)^ - 4b^ 

(t’rM = -^^5-, 1°°'’ 

where the negative root was chosen. In the TDA case we only 
look at positive energies, so the dynamical dipole polarizabil¬ 
ity could be approximated (with sufficient broadening) to be a 
continuous distribution where the terminator (|8F| can be used. 


For the full BSE case however, both positive and negative fre¬ 
quencies are explicitly treated. Since the time-ordered polar¬ 
izability (as well as the case without any imaginary conver¬ 
gence factor) is symmetric around w = 0 it has at least two 
distributions separated by a gap. 

The presence of the gap in the middle of the distribution is 
included in the second model we look at. Turchi et al analyzed 
the behavior of the recursion coefficients for densities of states 
with a gap and showed that if 2Eq is the gap (a and Ew dehned 
as before) the coefficients oscillate with limits a± - a±E q, 
and bn with limits (h± = Ew ± Eq)I2W^ The period of the 
oscillations depends on the details of the density of states. If 
no gap is present, we have the situation of equation ( |88| ). For 
a symmetric distribution around the middle of a single gap, 
which could be a good approximation to the full BSE case, 
the period is two, and the terminator is 

iprioj) = [w - a± - 


(89) 
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which has the solution (for the negative root) 


= -p(w)/2 - ^p^(a))/4 - q{u)), 

p(cxj) = - 


(a) - a±)(a> - a^- b\+ b%) 


{u) - a±)b% 


(90) 


q{a)) 


(oj - a±)bl 


We denote this model SC2. Because of symmetry around fre¬ 
quency o) - 0 the a„ coefficients will oscillate around zero 
in the full BSE case. Indeed, since only the odd mome nts of 
the line shape contribute to a„, they should be zercP^I^. How¬ 
ever, in practice, orthogonality between the Lanczos vectors 
will eventually be lost due to numerical errors, and this in¬ 
troduces non-zero values of a„. In practice one can at any 
point in the recursion sequence make the assumption that the 
coefficients have converged and so put in the value of the last 


computed coefficients in equation (881 or (90 1 . Another option 
is to make the assumption that the coefficients will converge 
to the average value of the already computed coefficients, re¬ 
moving some of the bias of the exact point in the chain the 
termination was made. We denote the averaged terminators 
by SC-av and SC2-av when the average is applied for the ter¬ 


minator in ( 881 or in (|9()|) respectively. When a„ is set to zero 


used in refs [20] and 


in the equations dM I, (|90|l, the terminator reduces to the one 


36] (except for the signs of b^ and ^„^j) 


which is appropriate for the full BSE case. The consequence 
of the choice of terminator is illustrated in figures 1^ and 1^ 
for the TDA and full BSE case respectively. The dynamical 
dipole polarizability was computed with 20 and 100 iterations 
for TDA and full BSE correspondingly, while using simple 
truncation, or terminators defined by equations ( |88| l or (j9^, 
with or without averaging. Eor TDA the self-consistent termi¬ 
nator SC gives a slight improvement while SC2 does better, 
although it introduces more broadening. When averaging the 
coefficients we introduce even more broadening in the contin¬ 
uum part of the dynamical dipole polarizability. 

Looking at the a„ and b„ coefficients we see that they do not 
converge but oscillate, which is expected because our small 
basis set cannot give rise to a continuous dynamical dipole 
polarizability in the continuum. Eor an arbitrary stick-like 
distribution the behavior of the coefficients is complicated. 
If we look at the averages of the coefficients, we see that 
(a„) ai 72 eV which is close to the center of the spectrum, 
65 eV, as estimated as half the range of the GW eigenvalues, 
while (bn) ~ 33 eV which is close to a quarter of the range 
of the spectrum, as expected. Averages of the even and odd 
b„ coefficients do not differ almost at all, hence the very simi¬ 
lar appearance of the averaged versions of terminators SC and 
SC2. Using two following b„ coefficients however, preserves 
some oscillations and gives a slightly better agreement to the 
converged spectrum. 

In the non-TDA case, the SC terminator fails completely 
and gives negative intensities. Here it is clear that at least two 
oscillating coefficients must be used for a reasonable descrip¬ 
tion. <a„) was confirmed to be zero, and the averages of the 
odd and even coefficients were seen to be 72 and 64 eV re¬ 
spectively. Their difference (8 eV) should correspond to half 



CO (eV) 


FIG. 5. Comparison of different terminators of the continued fraction 
for the iteratively computed TDA dynamical dipole polarizability of 
benzene. The number of iterations was set to 20. See the text for 
description of the different terminators. 


the gap Eg, roughly 6 eV in our calculations, estimated from 
the GW eigenvalues. Here again, we observe that taking the 
average leads to a smoother spectrum that does not necessar¬ 
ily improve things from only using the last two coefficients. 
This is likely due to the complicated oscillations coming from 
the stick-like distribution obtained with our small basis set. 


C. Demonstration of the low scaling with system size 

To demonstrate the scaling properties of our algorithm we 
performed Lanczos iterations for alkane chains of increasing 
length. One-dimensional systems are the most favorable cases 
for algorithms that make use of sparsity, since the number 
of overlapping functions will be small. To demonstrate the 
asymptotic scaling of our algorithm this system is also ideal 
— the part that scales cubically depends on the number of 
AOs, while the dominant quadratic scaling operations involve 
the number of overlapping AOs. A sparse one-dimensional 
system maximizes the ratio of the former to the latter. The 
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FIG. 6. Comparison of different terminators of the continued fraction 
for the iteratively computed full BSE dynamical dipole polarizability 
of benzene. The number of iterations was set to 100. See the text for 
description of the different terminators. 




FIG. 7. Runtime per Lanczos vector for alkane chains of different 
length, divided by the runtime of the CfAHno chain. The red points 
are the total runtime minus the runtime of the basis transformation, 
the blue the contribution of the basis transformation (denoted cubic). 
Dotted lines have been drawn between the points as a guide for the 
eye. 


terms to dominate completely. We must here stress the fact 
that we have used and almost artificially sparse system in or¬ 
der to demonstrate the cubic scaling of the algorithm. For 
more realistic systems that are less sparse and have more ba¬ 
sis functions per atom, the onset where the cubic terms start to 
dominate will occur much later. We can thus expect that the 
quadratic and lower terms will dominate for systems with up 
to several thousands of basis functions, i.e., for most systems 
that can be practically treated with standard DFT methods. 


IV. CONCLUSIONS 


ground state calculation was done with SIESTA using the 
EDA functional and a minimal SZ basis set. Although scal¬ 
ing like O(N^), our GW scheme turned out to be a bottleneck 
as the systems grow larger, and we therefore chose to bypass 
the GW step and directly do a TDHE benchmark starting from 
EDA eigenstates. Eor the purpose of testing the iterative BSE 
algorithm the choice of starting point makes no difference. In 
figure 1^ we show the runtime, per Lanczos step, or alkane 
chains of different sizes divided by the runtime of the smallest 
chain, C 64 H 130 . The largest alkane chain we considered was 
Ci 024^^2050 with 6146 basis functions. The pseudo-Hermitian 
algorithm was used in this comparison. In the figure the part 
of the runtime coming from the basis transformation in equa¬ 
tions ( |84| i, ( |84| ) that should scale cubically is contrasted to the 
remaining runtime contributions. Eor small systems the ba¬ 
sis transform is negligible in comparison to the other terms 
but due to its cubic asymptotic scaling it will eventually start 
to dominate. We see that for the largest chain considered the 
basis transformation consumes around half the runtime, and 
we would need to go to even larger systems for the cubic 


We have described, and implemented, an iterative scheme 
to compute the optical response of molecular systems at the 
Bethe-Salpeter level, using local basis sets. We go beyond the 
Tamm-Dancoff approximation by an extension of the Hermi- 
tian Haydock recursion scheme to the pseudo-Ffermitian case 
and provide a derivation of this extension. We show that it is 
possible to develop an implementation with low scaling with 
the system size by exploiting the localization of the basis set 
of numerical atomic orbitals. Proof of principle calculations 
are shown, focusing on the case of benzene, and the influence 
of the number of recursion vectors is discussed, as is the ef¬ 
fect of different terminators of the continued fractions on the 
obtained dynamical dipole polarizability. 

The theoretical scaling of our method is 0{N^). How¬ 
ever, calculations performed for alkane chains containing up 
to 1024 carbon atoms shows that the contribution of the cubic 
terms is small. Even for the largest systems considered the 
contribution of the cubic terms is of comparable magnitude to 
to that of the quadratic terms coming from the application of 
the Coulomb kernel in the atomic basis. 
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What we have presented here is a proof of principles of the 
method, plus an analysis of the convergence of our iterative 
BSE scheme. Our final goal, however, is to create a method 
(implemented in a suite of programs) capable of accurately in¬ 
vestigating complex systems containing thousands of atoms. 
In order to reach this goal we are currently investigating ways 
to improve the performance of the method. These include an 
efficient parallelization scheme and a more optimized basis set 
for the expansion of atomic orbital products. It is also the case 
that the GW calculation needed to obtain the quasiparticle en¬ 
ergies and states, as well as the screened interaction matrix 
elements, can benefit from similar improvements. 
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Appendix A: Derivation of the Bethe-Salpeter equation 


where the two-particle Green’s function is defined as 

G(l, 2,3,4) = . (A4) 


Instead of working with the two-particle Green’s function 
we will directly derive the Bethe-Salpeter equation for the 
four-point polarizability L. We will use two relations: the 
chain rule 


d//(3,4) 


/ 


d{56) 


5U[G](l,2)dG[i/](5,6) 
dG(5,6) d//(3,4) 


(A5) 


and a transformation of a derivative of a function to include 
its inverse 


6F{1,2) 

dG(3,4) 


= d(56)F(l,5) 


dF-'(5,6) 

dG(3,4) 


F(6,2)- 


(A6) 


Using equation (A 61 we can write 


dG(l,2) 

617(3,4) 



d(56)G(l,5)G(6,2) 


6G-‘(5,6) 
617(3,4) ■ 


(A7) 


From the Dyson equation for interacting Green’s function G 
we have 


G^'(5,6) = Go'(5,6)-t/(5,6)-VH(5)(5(5,6)-5:(5,6), (A8) 

where we added the external potential t/ to the Hamiltonian 
(it will be set to zero after the derivatives have been taken) and 
the Hartree potential vh is taken outside of the non-interacting 
Green’s function Gq. Evaluating the functional derivative, re¬ 
membering that Go is independent of [/, we get 


Here we derive the BSE equation following an approach 
similar to that given in refs [68] and [^ The purpose of this 
appendix is to derive the equations using our notation to avoid 
possible confusions with different notations and conventions 
that can be found in the literature. 

The reducible two-point polarizability is the response of the 
density to a local perturbation U 


t(1,2) 


6 p(l) _ .(5G(1,1+) 
6U(2) ~ 6U(2) 


(Al) 


where in the ”h-” superscript denotes the addition of a posi¬ 
tive infinitesimal to the time argument. A generalization can 
be made to the nonlocal response of the interacting Green’s 
function G to a nonlocal perturbation, giving the four-point 
polarizability 


L(l,2,3,4) = -i 


6G(1,2) 
617(3,4) ■ 


(A2) 


Comparing equation with ( |A2| i, we conclude that 

X(4,2) - L(\, 1^ 2 ,2). Using the Schwinger functional 
derivative methocPE2) and references therein) the following re¬ 
lation can be proved 


L(l, 2,3,4) = iG(l, 4,2,3) - /G(l, 2)G(4,3), (A3) 


dG-'(5,6) 

6U(3,4) 


= -6(3, 5)d(4,6) - 


= -(5(3,5)5(4,6)- 

5G(7,8) 


•S(5,6)] 


6U(3,4y 


—^[vh(5)5(5,6)+S(5,6)] 


(A9) 


where in the last step we used the chain rule ( |A5| l. Combining 
equations (A7 A9 1 we obtain 

5G(1,2) 


6U(3,4) 


=G(1,3)G(4,2) 


/ 


(f(5678)G(l,5)G(6,2) 


5G(7,8) 


+ 1.(5,6)] 


6G(7,8) 

6G(3,4) 


[vh(5)6(5,6) 


(AlO) 


Defining 

Lo(1,2,3,4) = -iG(l,3)G(4,2), (All) 

X(5,6,7,8) = i—^ [vh(5)5(5,6 ) + S(5,6)] , (A12) 

dCr(7, O) 
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we finally get the Bethe-Salpeter equation 


L(l,2,3,4) =Lo(1,2,3,4) 

+ J d(5678)Lo(l,2,5,6)K(5,6, 7, 8)L(7, 8,3,4). 

(A13) 

Up to this point the derivation has been exact. In order to 
obtain the working expression for the BSE kernel, K, we now 
make use of the GW approximation to the self energy. In this 
case both the Hartree potential vh and the self energy S can be 
expressed in terms of G: 


furthermore only have to worry about the difference t' - t. As 
in the Dyson equation for the Green’s function G, we can then 
Fourier transform to get a dependence of only one frequency, 
thus giving 


L(l,2,3,4|w) = Lo(1,2,3,4|w) 


f 


d(5678)U(\,2, 5,61 a))K{5, 6,7,8 | w)L(7,8,3,41 w). 

(A18) 


Appendix B: Spin strncture of the effective BSE and dependence 
of the occnpations for the polarizability 


/ 


yH(l)= t/(2)v(l,2)p(2) = 


-/ 


ii(2)y(l,2)G(2,2+), 


a. Spin structure 


E(l,2) = iG(l,2)VE(l,2), 


(A14) 

(A15) 


which, neglecting the dependence of W on G, gives 


K(l, 2,3,4) = y(l, 3)^(1,2)5(3,4) - W(l, 2)5(1,3)5(2,4). 

(A16) 

Here we note that the bare Coulomb interaction is instanta¬ 
neous v(l,2) = y(l,2)5(f2 - fi), but this is not in general the 


case for W. Equation (A13 i still depends on four times. For 
our purposes, we want to look at the response at time t from 
a perturbation at time f, that is our perturbations are local in 
time. In this case we can express L in terms of the density 
matrixp(l,2, f) as 


L(l, 2,3,4) = _ _ 

5G(3,4,f3) 


(A17) 


where we identify t — ti and t' — t^. Since the initial time is 
arbitrary for a system in equilibrium — the state of the system 
does not change in time when we are in the ground state — we 


The BSE Hamiltonian can be written in matrix form as 

+ FK (Bl) 

with K - . We assume a singlet closed shell ground 

state so the spatial orbitals are the same for spin up and spin 
down. Explicitly writing out the spin dependence of the or¬ 
bitals as i/f/(l) = i/',(r)xi(cr), and ipi{2) - il/i(r')xi{cr'), where 
the spin wave function x,(cr) can be either a(cr) or /3(cr). Due 
to orthogonality of the spin wave functions we get 

H-i, = - fdVr^;(rmr>W(r. , 

(B2) 

This gives the following structure of the problem in the spin 
indices 


aa 


aa 

a(S 

Pa 


' -H E(H“ -h //'**'■) 
0 


+ E(H“ -H H‘^‘’') 
0 
0 


ap Pa 

0 0 f 

0 0 

0 

0 , 


(B3) 


The upper left 2x2 block can easily be diagonalized to give 


-^{aa+PP) 

-^(aa-pp) 

aP 

Pa 


-^{aa+PP) 
' -H E(2i/“ -H 
0 

0 

0 


-^{aa-pp) ap Pa 

0 0 o' 

df + FH'^''^ 0 0 

0 0 
0 0 


(B4) 
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which leads to one singlet solution, where is included with 
a factor of 2, and three triplet solutions where is absent. 
Knowing this, we work just with the real space quantities, re¬ 
membering to include the correct scaling factor in front of 
depending on if we want a singlet or a triplet solution 




ij,kl 

^triplet _ Lfdir 

A,,,., 


ij,kl 

^triplt 


ij,kl ’ 


(B5) 


For the dipole elements we have 


^m,tiiplet _ Q 


b. Occupation number structure 
The time-ordered four-point polarizability 
Lij,ki(<^) - [(w + iyifi' - fj'))6i'k'Sj>i' - 

can be written in matrix form as 

L{co) = [(w -H iyF)I - . 


From this expression it looks like we have to use all pairs, 
that is not only particle-hole and hole-particle pairs but also 
particle-particle and hole-hole pairs. But actually, only the 
particle-hole and hole-particle pairs contribute to L. To see 
this we set up the and F matrices in blocks correspond¬ 
ing to the (vc), {cv),{vv) and (cc) sectors 




vc 


cv 


VV 

CC 


VC 

' iC + K 

K 


K 

K ' 


^BSE 

-K 


1 

o 

K 

-K 

-K 


VV 

0 


0 


fC 

0 

(B6) 

CC 

. 0 


0 


0 

H^] 




vc 

cv 

VV 

cc 




vc 

' I 

0 

0 

0 ' 


fi). 

F 

_cv 

0 

-I 

0 

0 


(B7) 


VV 

0 

0 

0 

0 




cc 

. 0 

0 

0 

0 , 



(B9) 


(BIO) 


(B8) This gives the following matrix to be inverted in equation 


(u + 'iyF)I - 



vc 

cv 

VV 

cc 

VC 

' {oj + iy)I-{H^ + K) 

-K 

-K 

-K 

cv 

K 

(w - iy)I - (//" - K) 

K 

K 

VV 

0 

0 

O 

1 

3 

0 

cc 

0 

0 

0 

toI-H 


(BID 


The inverse of a matrix with this block structure is 
-1 


A B 
0 D 


A-i -A 'BD ' 
0 D' 


(B12) 


Looking at the polarizability L - [{a> + iyF)I — we 

see that due to the leftmost F matrix only the A-block, that is 
the (vc) and {cv)-sectors of contribute to L. Diagonal¬ 

izing and expanding in left and right eigenvectors gives 
the following expression 


Lij,ki((^) - y^. 

^ ti)— 


A,A’ 


w - Q + ir(/i' - fj) 


(B13) 


r 


Note that this is the time-ordered polarizability, the retarded 
one that we need for the response, that is equation (24 1 , is ob¬ 


tained by setting the sign of the small imaginary part in the 
denominator to always be positive. We can also use the rela¬ 
tions Im LHa)) - sgn(«) Im Uioj) and Re L!((xt) - Re Uiu)), 
where the superscripts ”f” denotes time-ordered and ”r” re¬ 
tarded. 
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